---
title: Components
---
import { Aside } from '@astrojs/starlight/components';

Orama can be completely customized and extended by using its components architecture.

Depending on the case, a component can be a simple as a function or a slightly more complex interface. All components can be
synchronous or return a promise and Orama will make sure everything is handled correctly.

When no components are specified, an Orama database is created with some defaults components which satisfies most common use cases:

- English tokenizer with stemming disabled.
- BM25 and Radix-Tree based index.
- In memory documents store.

## Providing your own components

It's very easy to provide a custom component when creating a database: simply pass the `components` option when calling `create`.

For instance, this code:

```javascript
import { create, insert, search } from "@orama/orama";

const movieDB = await create({
  schema: {
    title: "string",
    director: "string",
    plot: "string",
    year: "number",
    isFavorite: "boolean",
  },
  components: {
    afterInsert() {
      console.log("INSERTED");
    },
  },
});

await insert(movieDB, {
  title: "Harry Potter and the Philosopher's Stone",
  director: "Chris Columbus",
  plot: "Harry Potter, an eleven-year-old orphan, discovers that he is a wizard and is invited to study at Hogwarts. Even as he escapes a dreary life and enters a world of magic, he finds trouble awaiting him.",
  year: 2001,
  isFavorite: false,
});

const results = await search(movieDB, { term: "Harry" });
console.log(results.count);
```

Will lead to this output:

```
INSERTED
2
```

## Supported components

### `tokenizer`

The tokenizer is used to tokenize documents fields and search terms. To customize the tokenizer used by Orama, provide an object which has at least the following properties:

- `tokenize`: A function that accepts the content to tokenize (string), the language (string) and the property name (string) and returns a list of tokens.
- `language` (string): The language supported by the tokenizer.
- `normalizationCache` (Map): It can used to cache tokens normalization.

<Aside type="caution">The `tokenize` function **CANNOT** be async.</Aside>

In other words, a tokenizer must satisfy the following interface:

```typescript
interface Tokenizer {
  language: string;
  normalizationCache: Map<string, string>;
  tokenize: (
    raw: string,
    language?: string,
    prop?: string
  ) => string[] | Promise<string[]>;
}
```

For instance, with the following configuration only the first character of each string will be indexed and only the first character of a term will be searched:

```javascript
import { create } from "@orama/orama";

const movieDB = await create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    tokenizer: {
      language: "english",
      normalizationCache: new Map(),
      tokenize(raw) {
        return raw[0];
      },
    },
  },
});
```

The Orama's default tokenizer is exported via `@orama/orama/components` and can be customized:

```javascript
import { create } from "@orama/orama";
import { tokenizer as defaultTokenizer } from "@orama/orama/components";

const movieDB = await create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    tokenizer: await defaultTokenizer.createTokenizer({
      language: "english",
      stemming: false,
    }),
  },
});
```

Optionally you can pass the customization options without using `createTokenizer`:

```javascript
import { create } from "@orama/orama";

const movieDB = await create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    tokenizer: { language: "english", stemming: false },
  },
});
```

### `index`

The index component is used to perform the indexing and searching of documents in Orama.

To customize the index used by Orama, provide an object which has at least the following properties:

- `create`: A function that creates a new index. It receives the following arguments:
  - `orama`: The Orama instance.
  - `mapper`: The document IDs mapper (see <a href="#internal-components">Internal components</a> section below).
  - `schema`: The documents schema.
- `insert`: A function that inserts a new document in the index. It receives the following arguments:
  - `implementation`: The current index implementation.
  - `index`: The index.
  - `prop`: The property that it is currently considered.
  - `id`: The ID of the document being inserted.
  - `value`: The value of the property in the document.
  - `expectedType`: The type of the property in the document according with the schema.
  - `language`: The language of the document.
  - `tokenizer`: The tokenizer associated with the current database.
  - `docsCount`: The number of documents in the documents store before the action is performed.
- `remove`: A function that removes a document from the index. It receives the same arguments as `insert`.
- `insertDocumentScoreParameters`: A function that inserts document information into the index for future results score calculation. It should be typically invoked within `insert`. It receives the following arguments:
  - `index`: The index.
  - `prop`: The property that is currently considered.
  - `id`: The ID of the document being inserted.
  - `tokens`: The list of the tokens found in the document.
  - `docsCount`: The number of documents in the documents store before the action is performed.
- `insertTokenScoreParameters`: A function that inserts token information into the index for future results score calculation. It should be typically invoked within `insert`. It receives the following arguments:
  - `index`: The index.
  - `prop`: The property that it is currently considered.
  - `id`: The ID of the document being inserted.
  - `token`: The token.
  - `tokens`: The list of the tokens found in the document.
- `removeDocumentScoreParameters`: A function that removes document scores information from the index. It should be typically invoked within `remove`. It receives the following arguments:
  - `index`: The index.
  - `prop`: The property that is currently considered.
  - `id`: The ID of the document being inserted.
  - `docsCount`: The number of documents in the documents store before the action is performed.
- `removeTokenScoreParameters`: A function that removes token score information from the index. It should be typically invoked within `remove`. It receives the following arguments:
  - `index`: The index.
  - `prop`: The property that is currently considered.
  - `id`: The ID of the document being inserted.
  - `token`: The token.
- `calculateResultScores`: A function that calculates the score for the results of the current search. It should be typically invoked within `search`. It receives the following arguments:
  - `context`: A search context with various useful information about the search.
  - `index`: The index.
  - `prop`: The property search.
  - `term`: The term used to search.
  - `ids`: The list of document IDs matched by the search.
- `search`: A function that searches documents in index data and returns matching IDs with scores. It receives the following arguments:
  - `context`: A search context with various useful information about the search.
  - `index`: The index.
  - `prop`: The property to search into.
  - `term`: The term to search for.
- `searchByWhereClause`: A function that searches in boolean and numeric indexes and returns a list of matching IDs. It receives the following arguments:
  - `context`: A search context with various useful information about the search.
  - `index`: The index.
  - `filters`: An object where keys are the properties to match and the values are search operators as described in the [filters](/open-source/usage/search/filters) page.
- `getSearchableProperties`: A function that returns a list of all searchable properties in the index. It receives the index as the only argument.
- `getSearchablePropertiesWithTypes`: A function that returns an object where keys are the searchable properties in the index and the values are the type of the index for a property. It receives the index as the only argument.
- `load`: A function that deserializes an index from a JavaScript object. It receives The document IDs mapper and a JavaScript object as its only argument and must return an index.
- `save`: A function that serializes the index into a JavaScript object. It receives the index as the only argument and must return a JavaScript object.

The following functions are optional:

- `beforeInsert` or `afterInsert`: Functions invoked before or after `insert`. They accept the same arguments as `insert` except the first one.
- `beforeRemove` or `afterRemove`: Functions invoked before or after `remove`. They accept the same arguments as `remove` except the first one.

For the more formal interface information, look for the `IIndex` interface in `src/types.ts` in Orama's source code.

The Orama's default index is based on BM25, Radix Trees and AVL trees. All its functions are exported via `@orama/orama/components` and can be composed to create a custom index:

```javascript
import { create } from "@orama/orama";
import { index as defaultIndex } from "@orama/orama/components";

const index = await defaultIndex.createIndex();
const movieDB = await create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    // This index will only customize the deserialization
    index: {
      ...index,
      load(documentsIdsMapper, raw) {
        // Do something here
      },
    },
  },
});
```

### `documentsStore`

The documentsStore component is used to store the documents in Orama.

To customize the documents store used by Orama, provide an object which has at least the following properties:

- `create`: A function that creates a new document store. It receives the following arguments:
  - `orama`: The Orama instance.
  - `mapper`: The document IDs mapper (see <a href="#internal-components">Internal components</a> section below).
- `get`: A function that returns a document from the store. It receives the following arguments:
  - The documents store.
  - The ID of the document to get.
- `getAll`: A function that returns all documents from the store. Note that the IDs in the returned object are the mapped IDs from the mapper component. It receives the following arguments:
  - The documents store.
- `getMultiple`: A function that returns multiple documents from the store. It receives the following arguments:
  - The documents store.
  - A list of IDs of the documents to get.
- `getAll`: A function that returns all the documents from the store. It receives the following arguments:
  - The documents store.
- `store`: A function that stores a new document in the documents store. It receives the following arguments:
  - The documents store.
  - The ID of the new document to store.
  - The document to store.
- `remove`: A function that removes a document from the documents store. It receives the following arguments:
  - The documents store.
  - The ID of the new document to remove.
- `count`: A function that returns the count of the documents currently stored. It receives the current documents store as the only argument.
- `load`: A function that deserializes a documents store from a JavaScript object. It receives The document IDs mapper and a JavaScript object as its only argument and must return a documents store.
- `save`: A function that serializes the documents store into a JavaScript object. It receives the current documents store as the only argument and must return a JavaScript object.

For the more formal interface information, look for the `IDocumentsStore` interface in `src/types.ts` in Orama's source code.

The Orama's default documents store is based on simple JavaScript object. All its functions are exported via `@orama/orama/components` and can be composed to create a custom documents store:

```javascript
import { create } from "@orama/orama";
import { documentsStore as defaultDocumentsStore } from "@orama/orama/components";

const store = await defaultDocumentsStore.createDocumentsStore();
const movieDB = await create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    // override partially the default documents store
    documentsStore: {
      ...store,
      remove(s, id) {
        // Apply custom logic
        return store.remove(s, id);
      },
    },
  },
});
```

### `sorter`

The sorter component is used to store the documents in Orama.

To customize the documents sort used by Orama, provide an object which has at least the following properties:

- `create`: A function that creates a new sorter. It receives the following arguments:
  - `mapper`: The document IDs mapper (see <a href="#internal-components">Internal components</a> section below).
  - `schema`: The documents schema.
  - `configuration`: The sorter configuration.
- `insert`: A function that inserts a new document in the sorter. It receives the following arguments:
  - `sorter`: The sorter returned by the `create` function.
  - `prop`: The property that is currently considered.
  - `id`: The ID of the document being inserted.
  - `value`: The value of the property in the document.
  - `schemaType`: The type of the property in the document according with the sort schema.
  - `language`: The language of the document.
- `remove`: A function that removes a document from the index. It receives the following arguments:
  - `sorter`: The sorter returned by the `create` function.
  - `prop`: The property that is currently considered.
  - `id`: The ID of the document being inserted.
- `sortBy`: A function that inserts document information into the index for future results score calculation. It should be typically invoked within `insert`. It receives the following arguments:
  - `sorter`: The sorter returned by the `create` function.
  - `docIds`: A [string, number] array contains for each id the weight.
  - `by`: The SortParameters specified during the search
- `getSortableProperties`: A function that returns a list of all sortable properties in the sorter. It receives the index as the only argument.
- `getSortablePropertiesWithTypes`: A function that returns an object where keys are the sortable properties in the sorter and the values are the type of the sort for a property. It receives the index as the only argument.
- `load`: A function that deserializes a sorter from a JavaScript object. It receives The document IDs mapper and a JavaScript object as its only argument and must return a sorter.
- `save`: A function that serializes the sorter into a JavaScript object. It receives the sorter as the only argument and must return a JavaScript object.

```javascript
import { create } from "@orama/orama";
import { sorter as defaultSorter } from "@orama/orama/components.js";

const s = await defaultSort.createSorter();
const db = await create({
  schema: {
    number: "number",
  },
  components: {
    sorter: {
      // override partially the default sorter
      ...s,
      async remove(sort, prop, id) {
        // Apply custom logic here
        return s.remove(sort, prop, id);
      },
    },
  },
});
```

### General purpose components

The components in this category are simple functions which are internally used by Orama.
Depending on the use case the component must return a value. Orama will await if a Promise is returned.

### `validateSchema`

The component is used to validate a document against the schema.
The function should return `undefined` if the document is valid according to the schema,
the path of the invalid property otherwise.

The function will receive two arguments:

- The document that is being validated.
- The schema provided to `create`.

```javascript
import { create, insert } from "@orama/orama";

const movieDB = await create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    validateSchema(doc) {
      return typeof doc.name === "string" && typeof doc.director === "string";
    },
  },
});

// This will throw
await insert(movieDB, {
  title: "Harry Potter and the Philosopher's Stone",
  director: 42,
});
```

### `getDocumentIndexId`

The component is used to extract or generate a unique ID for a document. The returned value must be string.

The function will receive one argument:

- The document for which an ID is being generated.

```javascript
import { create, insert } from "@orama/orama";

const movieDB = await create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    getDocumentIndexId(doc) {
      return doc.id ?? Date.now().toString();
    },
    afterInsert(_orama, _doc, id) {
      console.log(id);
    },
  },
});

// This will print something like "1679476550629"
await insert(movieDB, {
  title: "Harry Potter and the Philosopher's Stone",
  director: "Chris Columbus",
});
```

### `getDocumentProperties`

The component is used to extract indexable properties from a document.

The function receives two arguments:

- The document that is being read.
- A list of properties paths (using dotted syntax) to extract.

The function must return an object where the keys are the paths received as argument.

```javascript
import { create } from "@orama/orama";
import { get } from "lodash/get";

const movieDB = await create({
  schema: {
    title: "string",
    director: "string",
  },
  components: {
    getDocumentProperties(doc, paths) {
      return Object.fromEntries(
        paths.map((path) => {
          return [path, get(doc, path)];
        })
      );
    },
  },
});
```

### `formatElapsedTime`

The component is used to format the `elapsed` property in the search results. The return value can be a `number`, a `string` or an `object`.

The function receives a single argument: the search elapsed time as BigInt.

```javascript
import { create, insert, search } from "@orama/orama";

const movieDB = await create({
  schema: {
    title: "string",
    director: "string",
    plot: "string",
    year: "number",
    isFavorite: "boolean",
  },
  components: {
    formatElapsedTime(n) {
      return `${Number(n)} - custom`;
    },
  },
});

await insert(movieDB, {
  title: "Harry Potter and the Philosopher's Stone",
  director: "Chris Columbus",
  plot: "Harry Potter, an eleven-year-old orphan, discovers that he is a wizard and is invited to study at Hogwarts. Even as he escapes a dreary life and enters a world of magic, he finds trouble awaiting him.",
  year: 2001,
  isFavorite: false,
});

const results = await search(movieDB, { term: "Harry" });

// This will print something like: 100 - custom
console.log(results.elapsed);
```

## Internal components

### Documents IDs mapper

In order to improve performance, Orama uses an internal ID for each document. The documents IDs mapper component is used to maintain a between the user document ID and the Orama document ID.

This component is internal and cannot be replaced by the developer. The component is passed to the customizable components (like `documentsStore`) and must be treated as an opaque object.

When writing or reading documents in the `data` section of The Orama instance, such as `orama.data.index`, make sure you always use a internal ID.

Orama exports two helpers which will help dealing with this operations:

- `getInternalDocumentId`: A function that receives a documents mapper object and an external document ID and returns an internal document ID.
- `getDocumentIdFromInternalId`: A function that receives a documents mapper object and an internal document ID and returns an external document ID.

## Extending Orama

As Orama is an Open Source Software, we gladly accept proposal for new functionalities.

This obviously include the definition of new components which are used internally by Orama and that can be customized by the users.

### Step 1: Define a new interface

If you want to create a new component, you first have to define your component in the `ObjectComponents` or `FunctionComponents` interfaces in `src/types.ts`.

In case of object components, the definition should be a new interface defined in the same file. As convention, start the interface with the letter `I`.

Example:

```diff
+ interface IShiningDetector {
+   isShining(): SyncOrAsyncValue<boolean>
+ }

export interface ObjectComponents {
  tokenizer: Tokenizer | TokenizerConfig
  index: IIndex
  documentsStore: IDocumentsStore
+ shiningDetector: IShiningDetector
}
```

Remember that all functions used in Orama's components can be async, so we advise to use the `SyncOrAsyncValue` for their return value.
This also implies that when you invoke this function you should always use `await` to make sure the function is correctly handled whether it is async or not.

### Step 2: Define and store data in the database

If your component needs to store some data in the orama database, you have to add a new field in the `Data` interface in `src/types.ts`.

<Aside type="caution">
To avoid compatibility problems, only modify the data in the new field you define.
Any component functions should accept the data as first argument instead of the full database if possible.
</Aside>

Example:

```diff
+ export interface IShiningDetector {
+   isShining(shining: Record<string, number>): SyncOrAsyncValue<number>
+ }

export interface ObjectComponents {
  tokenizer: Tokenizer | TokenizerConfig
  index: IIndex
  documentsStore: IDocumentsStore
+ shiningDetector: IShiningDetector
}

interface Data<I extends OpaqueIndex, D extends OpaqueDocumentStore> {
  index: I
  docs: D
+ shining: Record<string, number>
}
```

### Step 3: Define your component default implementation

Create a file (or folder if appropriate) under the `src/components` folder with appropriate exports.

<Aside type="caution">Remember to export both individual functions and the default object (or creation factory function) to allow users to compose different components.</Aside>

Example (`../components/shiningDetector.js`):

```diff
+ import { IShiningDetector } from '../src/types.js'
+
+ export function isShining(shining: Record<string, number>, subject: string): number {
+   return shining[subject] ?? 0
+ }
+
+ export function createShiningDetector(): IShiningDetector {
+   return { isShining }
+ }
+
```

Note that exporting a `create*` factory function is not strictly needed but it helps to isolate initialization tasks.

### Step 4: Update the `create` method

Update the `create` method in `src/methods/create.ts` to use the component provided in the options, or create a new one using the object or factory function defined in the previous step.

If you added a field to the `Data` interface, also provide the initial value.

Example:

```diff

import { createShiningDetector } from '../components/shiningDetector.js'

export async function create({ schema, language, components }: CreateArguments): Promise<Orama> {

  // ...

  const orama = {
    data: {},
    caches: {},
    // ...
+   shiningDetector: components.shiningDetector ?? await createShiningDetector()
  } as Orama

  orama.data = {
    index: await orama.index.create(orama, schema),
    docs: await orama.documentsStore.create(orama),
    shining: { Paolo: 10 }
  }
```

### Step 5: Have fun!

Call your component's functions where appropriate!

<Aside>Yes, we know the example below is completely useless and silly. 😁</Aside>

Example:

```diff
export async function search(orama: Orama, params: SearchParams, language?: string): Promise<Results> {
+   console.log('Shining level', await orama.shiningDetector.isShining(params.term))

  // ...
```
